1. Introduction

Quantum information theory and the application areas based upon it, such as
quantum computing and quantum cryptography, have attracted massive research interest
and are developing rapidly. Quantum computing and quantum cryptography are
primarily concerned with the impact that the nature of physical quantum systems
has on the respective fields of computing and cryptography. When using physical systems,
one has to obey the restrictions imposed by nature on physical quantum states, for
example that they are not directly observable. They can be measured, but in
general a measurement reveals only part of the information contained in the quantum
state.

Some authors have applied quantum information theory to statistics and prob- 
ability theory ([2111]) as well as signal and image processing applications ([S1[S]). 
Their work focuses mainly on the abstract mathematical concept of quantum in- 
formation theory and might be referred to as Quantum Information processing 
Algorithms (QIA) in analogy to the term Quantum Signal Processing created by 
Eldar ([8]). Applying the formalism of quantum information theory to information 
processing on classical computers can result in novel but still 'classical' algorithms. 

In this paper we follow the QIA approach, motivated by the fact that we have 
found considerably easier descriptions of well suited algorithmic solutions for the 
problems at hand in several application areas ( j2Ql |23j |2T| ) within the mathematical
setting of the quantum space. One reason might be that we were forced to think 
more deeply about the normalization of data and observations and the metrical 
functions to be used. A crucial factor seems to be the ability to represent some 
relations between two (or more) sources of information jointly as quantum infor- 
mation, i.e. in a quantum state. In analogy to the behaviour of physical conjugate 
variables, e.g. position and momentum, which play a fundamental role in quantum 
mechanics, we term this a conjugate information variable. In essence we refer to 
two sources of information as being conjugate to each other if they are in a special 
relation: Whenever one of them becomes highly predictable the other one is either 
undefined or unpredictable and vice versa. 

The special relation between physical observations of conjugate quantities was 
an important factor in the development of quantum mechanics in the early 20th
century. In the quantum mechanical setting, Heisenberg's uncertainty principle 
expresses this relation, e.g. between momentum and position. To be more precise, 
momentum and position are represented as operators in quantum mechanics and 
they are related to each other by derivatives. Here an exact definition of uncertainty 
means that these operators do not commute. 

Our main justification for using the same mathematical structure, i.e. Hilbert
spaces, is that information sources exist which have a similar conjugate relation.
As an illustrative example, suppose we want to observe two species of birds, say
one blue and the other green. Their chirps are quite distinct but cannot be
reliably heard during daytime due to background noise, which is absent at night.
Obviously, during the day we use our eyes and at night our ears to distinguish the
species. But is there a seamless way to fuse both signals that additionally helps us
to distinguish them at sunrise or sunset with higher reliability?

Or more seriously, consider a simple classification task where we want to decide 
whether parts of a function are locally constant, rising, falling or minimum or max- 
imum. We might want to use the derivatives of the function to achieve this, as 
we know that when the first derivative becomes zero the entire information on the 
extreme points of the function relies on the sign of the second (or third...) deriva- 
tive. On the other hand when the second derivative becomes zero, the information 
whether we found a point of (rising or falling) inflection or the function is constant 
at this point depends entirely on the first (or third...) derivative. Again it would 
be desirable to represent the first and second derivative such that in 'grey areas' 
in between the two extremes either one alone (if it is more reliable) or both may 
contribute to the classification result. 

The authors are fully aware that most of the material on theoretical quantum 
information is already covered elsewhere. In particular we refer to the textbooks 
of Gruska [T^] and Nielsen and Chuang Q35]. Nonetheless we find it worthwhile to 
summarize some of the basic ideas behind quantum information theory in order to 
help readers who are not too familiar with them and the notation used in this area. 
We will omit reoccurring references to [321 US] , where most of the material is treated 
in-depth, and indicate when the result can be found elsewhere. This introduction to 
quantum information theory is contained in Chapter [2] whereas Chapter [3] analyzes 
the encoding of information into quantum states. In Chapter [4] we take a closer 
look on two-dimensional systems. A new approach to signal segmentation is given 
in Chapter [5]. We conclude the paper with some discussion and references to future
work (Chapters [6], [7]).

The fact that in most parts of this paper we consider random variables and 
distributions rather than signals reflects the fact that we assume the signals to be 
normalized. In quantum states/signals the same assumption is usually made. 

2. An Introduction to Quantum Information Theory 

2.1. Quantum Systems, Quantum States and Qubits. The first postulate of
quantum mechanics states that any physical system can be described by a unit
vector in an associated complex vector space, which is called the Hilbert space $\mathcal{H}$. A
given unit vector $\psi$ is called the state of the system. In this paper we restrict
ourselves to finite dimensional Hilbert spaces. The simplest non-trivial system is
a two-dimensional system with the state space $\mathcal{H} = \mathbb{C}^2$. Such systems are called
quantum bits or qubits. In the usual mathematical notation, the state of a qubit
can thus be written as the vector

(2.1) $\psi = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha e_0 + \beta e_1$ with $\alpha, \beta \in \mathbb{C}$ and $|\alpha|^2 + |\beta|^2 = 1$,

where the basis vectors $e_0$ and $e_1$ are orthonormal unit vectors. In quantum
computing, it is customary to use the Dirac (or "bra-ket") notation: The column
"ket" vectors

(2.2) $|0\rangle = e_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = e_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$,

or generally $|i\rangle = e_i$ for higher dimensional cases, form the computational or
canonical basis of $\mathcal{H}$ and the above state can be written as a superposition

(2.3) $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$.

The dual "bra" vector $\langle\psi|$ corresponds to the associated row vector $(\alpha^*, \beta^*)$ with
complex conjugate components, thus

(2.4) $\langle\psi| = \alpha^* \langle 0| + \beta^* \langle 1|$.

The scalar product $\langle\psi|\chi\rangle$ and the outer product $|\psi\rangle\langle\chi|$ are thus reduced to
matrix multiplications. Any linear operator $A : \mathcal{H} \to \mathcal{H}$ over $\mathcal{H} = \mathbb{C}^n$ can be
written as

(2.5) $A = \sum_{i,j=0}^{n-1} a_{i,j} |i\rangle\langle j|$.

For a Hermitian operator, $A = A^\dagger$ holds, where in matrix notation $A^\dagger$ is the
transpose of $A$ with complex conjugate entries. Whenever we omit the index bounds,
the sum runs over $0, \ldots, n-1$, where $n$ is the dimension of the considered Hilbert
space.

The orthonormality and completeness of the computational basis can be written as

(2.6) $\langle i|j\rangle = \delta_{ij}$ and $\sum_i |i\rangle\langle i| = I$,

where $\delta_{ij}$ denotes the Kronecker delta function and $I$ is the identity. Table 1
summarizes frequently-used Dirac notation and its meaning.
Notation                      Description

$|\psi\rangle$                general "ket" vector, e.g. $|\psi\rangle = (c_0, c_1, \ldots)^T$
$\langle\psi|$                dual "bra" vector to $|\psi\rangle$, e.g. $\langle\psi| = (c_0^*, c_1^*, \ldots)$
$|n\rangle$                   $n$-th basis vector of computational basis $N = \{|0\rangle, |1\rangle, \ldots\}$
$\langle\phi|\psi\rangle$     inner product of $|\phi\rangle$ and $|\psi\rangle$ (scalar)
$|\phi\rangle\langle\psi|$    outer product of $|\phi\rangle$ and $|\psi\rangle$ (matrix)
$|\phi\rangle|\psi\rangle$    tensor product $|\phi\rangle \otimes |\psi\rangle$ (vector)
$|i,j\rangle$                 tensor product $|i\rangle|j\rangle$ of the basis vectors $|i\rangle$ and $|j\rangle$

Table 1. Dirac ("bra-ket") Notation
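
The notation of Table 1 maps directly onto ordinary complex vector arithmetic. The following is a minimal sketch, assuming kets are stored as plain Python lists of complex amplitudes; the helper names `bra`, `inner` and `outer` are illustrative choices and do not appear in the text.

```python
# Kets are plain lists of complex amplitudes; all helper names are illustrative.

def bra(ket):
    """Dual vector <psi|: the complex-conjugated components of |psi>."""
    return [c.conjugate() for c in ket]

def inner(phi, psi):
    """Inner product <phi|psi>, a scalar."""
    return sum(a * b for a, b in zip(bra(phi), psi))

def outer(phi, psi):
    """Outer product |phi><psi|, a matrix stored as a list of rows."""
    return [[a * b for b in bra(psi)] for a in phi]

ket0 = [1 + 0j, 0j]          # |0>
ket1 = [0j, 1 + 0j]          # |1>
psi = [0.6 + 0j, 0.8j]       # alpha = 0.6, beta = 0.8i (arbitrary example)

norm_sq = inner(psi, psi).real            # |alpha|^2 + |beta|^2
p0, p1 = outer(ket0, ket0), outer(ket1, ket1)
# Completeness (2.6): |0><0| + |1><1| should be the identity matrix.
identity = [[p0[r][c] + p1[r][c] for c in range(2)] for r in range(2)]
```

Here `norm_sq` checks the normalization in (2.1), and `identity` checks the completeness relation in (2.6) for the two-dimensional case.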



2.2. Measurement, von Neumann Measurement. From a physical point of
view any information contained in a quantum state $|\psi\rangle$ can only be accessed via
a measurement, which is described by a set of (Hermitian) operators adding up to
the identity operator (completeness condition):

(2.7) $\sum_{i=0}^{m-1} M_i = I$.

As a result of a measurement on $|\psi\rangle$ we get an index $i \in \{0, \ldots, m-1\} = S$ with
some probability. A standard interpretation of a measurement is that the state
'collapses' onto the post-measurement state $\frac{1}{\sqrt{p(i)}} M_i |\psi\rangle$ with probability
$p(i) = \langle\psi| M_i |\psi\rangle$. It should be noted that the idea that the quantum state
is spontaneously changed by a measurement has been the subject of much debate ever since it
was proposed in the 1950s. We do not want to address this issue, as the material
discussed here is unaffected by it. The fact that the only direct information we
can gain from a physical quantum system is the outcome $i$ of the measurement is
undisputed. Reapplying the measurement does not change this result any further.
This means in particular that we cannot gain any direct information about the
coordinates of a quantum state.

However, if we are given a large number of 'identical' quantum states and we
measure them one by one, we eventually gain the frequency of each index and, after
proper normalization, the probability information $p(i)$ for all $i \in S$. Taking the
completeness condition (2.7) into account we see that the probabilities properly
add up to one: $\sum_i p(i) = \langle\psi| \left(\sum_i M_i\right) |\psi\rangle = \langle\psi| I |\psi\rangle = 1$. In slightly different terms we
get the probability distribution of a discrete random variable over the given index
set, $P(X = i) = p(i)$. The values the random variable can take$^1$ and the index
of the measurement operators are in a one-to-one correspondence. The resulting
probability distribution is fixed as soon as we are given the state and the set
of measurement operators $M = \{M_i\}_{i=0}^{m-1}$. In the next section we will see that the
converse in general is not true. That is, a given probability distribution does not
uniquely determine a quantum state and/or a set of measurement operators.

In this paper we will restrict ourselves to a particularly simple form of
measurement which is described by projectors, the so-called von Neumann measurement.

$^1$ the elements of the $\sigma$-algebra.

Given a unit vector $|\psi\rangle$, the projector $P$ onto $|\psi\rangle$ takes the form

(2.8) $P = |\psi\rangle\langle\psi|$.

Applying $P |\psi\rangle = |\psi\rangle\langle\psi|\psi\rangle = 1 \cdot |\psi\rangle$ we see that it projects $|\psi\rangle$ onto itself, and any
other unit vector $|\chi\rangle$ is projected onto $P |\chi\rangle = |\psi\rangle\langle\psi|\chi\rangle = \langle\psi|\chi\rangle |\psi\rangle$, i.e. in the direction
of $|\psi\rangle$ times the scalar $\langle\psi|\chi\rangle$. If $P^\perp = I - P$ is the projector onto the orthogonal
complement space of $P$, then $\{P, P^\perp\}$ forms a measurement. For any state $|\chi\rangle$ the
probability that it is changed into $|\psi\rangle$ via this measurement is

(2.9) $p(|\chi\rangle \to |\psi\rangle) = \langle\chi| P |\chi\rangle = \langle\chi|\psi\rangle \langle\psi|\chi\rangle = |\langle\psi|\chi\rangle|^2$,

whereas

(2.10) $\langle\chi| P^\perp |\chi\rangle = \langle\chi| (I - P) |\chi\rangle = 1 - |\langle\psi|\chi\rangle|^2$

gives the probability that the post-measurement state lies in the orthogonal
complement space. $p(|\chi\rangle \to |\psi\rangle) = |\langle\psi|\chi\rangle|^2$ is called the transition probability. In the
next section we will analyze these equations further.
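
Equations (2.8) to (2.10) can be checked numerically. The sketch below is only an illustration under the assumption that states are lists of complex amplitudes; the function names and the example state `chi` are hypothetical.

```python
import math

def inner(phi, psi):
    """Inner product <phi|psi> for kets given as lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def project(psi, chi):
    """P|chi> with P = |psi><psi|: the vector <psi|chi> |psi>."""
    scale = inner(psi, chi)
    return [scale * c for c in psi]

def transition_probability(chi, psi):
    """p(|chi> -> |psi>) = |<psi|chi>|^2, Eq. (2.9)."""
    return abs(inner(psi, chi)) ** 2

psi = [1 + 0j, 0j]                                       # |psi> = |0>
chi = [math.cos(math.pi / 6) + 0j, 1j * math.sin(math.pi / 6)]

p = transition_probability(chi, psi)     # probability of outcome P
p_perp = 1 - p                           # probability of outcome I - P, Eq. (2.10)
projected = project(psi, chi)            # its squared length equals p
```

For this choice of `chi` the two outcome probabilities are 3/4 and 1/4, and the squared length of the projected vector reproduces the transition probability.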



2.3. Operator Space, Density Operator. The quantum state space under
consideration is usually called the principal system. The operator space over a principal
system is given by all linear mappings of this state space onto itself, i.e. given the
state space $\mathcal{H}$ the operator space is $\mathcal{H} \otimes \mathcal{H} = L_\mathcal{H}$. The inner product in the operator
space takes the form of the Hilbert-Schmidt inner product:

(2.11) $\langle A; B \rangle = \mathrm{tr}(A^\dagger B)$, $A, B \in \mathcal{H} \otimes \mathcal{H}$,

where the trace of an operator is the sum of the diagonal elements of the operator
written as a matrix. In terms of the Dirac notation the trace of an operator may
be written as

$\mathrm{tr}(A) = \mathrm{tr}\Big(\sum_{i,j} a_{i,j} |i\rangle\langle j|\Big) = \sum_k \langle k| \Big(\sum_{i,j} a_{i,j} |i\rangle\langle j|\Big) |k\rangle$

(2.12) $= \sum_k a_{k,k}$.

Equation (2.11) enables us to define a norm on $L_\mathcal{H}$:

$\|A\|^2 = \mathrm{tr}(A^\dagger A) = \mathrm{tr}\Big(\Big(\sum_{i,j} a^*_{i,j} |j\rangle\langle i|\Big) \Big(\sum_{i',j'} a_{i',j'} |i'\rangle\langle j'|\Big)\Big)$

$= \mathrm{tr}\Big(\sum_{i,j,j'} a^*_{i,j} a_{i,j'} |j\rangle\langle j'|\Big) = \sum_{i,j} a^*_{i,j} a_{i,j}$,

(2.13) $\|A\| = \sqrt{\sum_{i,j} |a_{i,j}|^2}$.

(2.13) is called the Frobenius norm or Hilbert-Schmidt norm of $A$. Consequently,
for operators $A, B \in L_\mathcal{H}$ we may calculate the Euclidian distance $\|A - B\|$:

$\|A - B\| = \sqrt{\sum_{i,j} (a^*_{i,j} - b^*_{i,j})(a_{i,j} - b_{i,j})} = \sqrt{\sum_{i,j} \left[ |a_{i,j}|^2 + |b_{i,j}|^2 - 2\Re(a_{i,j} b^*_{i,j}) \right]}$

(2.14) $= \sqrt{\|A\|^2 + \|B\|^2 - 2\Re\, \mathrm{tr}(A B^\dagger)}$,

where $\Re$ gives the real part of a complex number.

Finally we would like to emphasize the well known fact that global phase changes
of a state do not affect the expectation value of an operator.

Lemma 2.1 (Global phase invariance of the expectation of linear operators). Let
$A \in L_\mathcal{H}$ be a linear operator. Then the expectation of $A$ is invariant with respect to
global phase changes, i.e. for any $|\tilde\psi\rangle = e^{i\varphi} |\psi\rangle$, $-\pi < \varphi \le \pi$, $|\psi\rangle \in \mathbb{C}^n$:

$\langle\tilde\psi| A |\tilde\psi\rangle = \langle\psi| A |\psi\rangle$.

This is easily verified by

Proof. $\langle\tilde\psi| A |\tilde\psi\rangle = \langle\psi| e^{-i\varphi} A e^{i\varphi} |\psi\rangle = e^{-i\varphi + i\varphi} \langle\psi| A |\psi\rangle = \langle\psi| A |\psi\rangle$. □
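
Lemma 2.1 is easy to confirm numerically. The following sketch assumes an arbitrary 2x2 operator and an arbitrary phase angle, both chosen here purely for illustration; `expectation` is a hypothetical helper name.

```python
import cmath

def expectation(A, psi):
    """<psi|A|psi> for a matrix A (list of rows) and a state vector psi."""
    A_psi = [sum(A[r][c] * psi[c] for c in range(len(psi))) for r in range(len(A))]
    return sum(p.conjugate() * v for p, v in zip(psi, A_psi))

A = [[1 + 0j, 2 - 1j],
     [0.5j, -3 + 0j]]                 # arbitrary linear operator, not necessarily Hermitian
psi = [0.6 + 0j, 0.8j]                # unit vector
phase = cmath.exp(1j * 1.234)         # global phase factor e^{i phi}
psi_shifted = [phase * c for c in psi]

e_original = expectation(A, psi)
e_shifted = expectation(A, psi_shifted)
```

The two expectation values agree up to floating-point rounding, since the factors $e^{-i\varphi}$ and $e^{i\varphi}$ cancel exactly as in the proof.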

After these general remarks on the operator space we will turn our attention to
a special subset of operators, the so-called density operators. Let us first consider the
single qubit state

(2.15) $|+\rangle = \frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle$.

When we measure $|+\rangle$ by projectors onto the computational basis $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$,
we will find that the two possible post-measurement states $|0\rangle$ and $|1\rangle$ are
equally likely, with a transition probability of $p_0 = p_1 = \frac{1}{2}$. So the experiment is
equivalent to a classical flip of an unbiased coin and we might be tempted to think
that $|+\rangle$ in fact describes a situation where the system state is unknown and is either
$|0\rangle$ or $|1\rangle$ with equal probability.

If, however, we measure via $\{|+\rangle\langle +|, |-\rangle\langle -|\}$ with

(2.16) $|-\rangle = \frac{1}{\sqrt{2}} |0\rangle - \frac{1}{\sqrt{2}} |1\rangle$,

on the same state $|+\rangle$, then the outcome is deterministic as $|+\rangle$ is an eigenvector of
$|+\rangle\langle +|$. Obviously, the non-determinism of a quantum measurement is inherently
different from the classical randomness of (un)biased coin flips.
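
The contrast between the two measurements can be made concrete with a few lines of arithmetic. This is a minimal sketch using real-valued vectors; the helper `outcome_probability` and the list names are illustrative only.

```python
import math

s = 1 / math.sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]     # computational basis |0>, |1>
plus, minus = [s, s], [s, -s]          # |+> and |-> from (2.15), (2.16)

def outcome_probability(basis_vector, state):
    """|<b|state>|^2 for real-valued vectors (no conjugation needed here)."""
    return sum(b * a for b, a in zip(basis_vector, state)) ** 2

# Measuring |+> in the computational basis: two equally likely outcomes.
computational = [outcome_probability(b, plus) for b in (zero, one)]
# Measuring |+> in the {|+>, |->} basis: the outcome is certain.
plus_minus = [outcome_probability(b, plus) for b in (plus, minus)]
```

The first list is (up to rounding) one half for each outcome, while the second assigns all probability to the $|+\rangle$ outcome.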

There exists an elegant and consistent way to capture both concepts of
randomness: Assuming we know that a quantum system is in one of $k$ states $|\psi_i\rangle$ with
the probabilities $p_i$, then we can aggregate our knowledge of the state in a positive
(semi-definite) density operator

(2.17) $\rho = \sum_{i=1}^{k} p_i |\psi_i\rangle\langle\psi_i|$ with $\sum_{i=1}^{k} p_i = 1$.

As the probabilities add up to 1 and by the linearity of the trace, it is assured
that $\mathrm{tr}\,\rho = \mathrm{tr}\left(\sum_{i=1}^{k} p_i |\psi_i\rangle\langle\psi_i|\right) = \sum_{i=1}^{k} p_i\, \mathrm{tr}\left(|\psi_i\rangle\langle\psi_i|\right) = \sum_{i=1}^{k} p_i = 1$, where we
used Eq. (2.12). If $\mathrm{tr}\,\rho^2 = 1$ then $\rho$ can be written in the form $\rho = |\psi\rangle\langle\psi|$ and the
system is said to be in a pure state. If $\mathrm{tr}\,\rho^2 < 1$, then $\rho$ is referred to as a mixed state.
The transition probability $p(\rho \to \chi)$ to reduce $\rho$ (pure or mixed state) to the pure
state $\rho' = |\chi\rangle\langle\chi|$ when performing a measurement in a basis containing $|\chi\rangle$ is the
expectation value

(2.18) $p(\rho \to \chi) = \langle\rho\rangle_\chi = \langle\chi| \rho |\chi\rangle$.

Note that for pure states, (2.18) is equivalent to (2.9), as we would expect.
There are some properties of density operators that should be mentioned:

• $\mathrm{tr}\,\rho = 1$ (see above);

• $\rho = \rho^\dagger$, i.e. density operators are Hermitian operators;

• any density operator has a spectral decomposition, $\rho = U D U^\dagger$, where
$U$ is a unitary matrix, i.e. $U^\dagger U = I$, and $D$ is a diagonal matrix with
nonnegative real diagonal elements. Therefore, $\rho = \sum_i d(i) |u_i\rangle\langle u_i|$, with
$\sum_i d(i) = 1$, where the $|u_i\rangle$ form the basis generated by the columns of $U$ and the $d(i)$
are the diagonal entries of $D$. The $|u_i\rangle$ are called the eigenbasis of $\rho$;

• consider two density operators in their eigenbasis representation:
$\rho = \sum_i p(i) |\psi_i\rangle\langle\psi_i|$ with $\sum_i p(i) = 1$ and $\sigma = \sum_j q(j) |\chi_j\rangle\langle\chi_j|$ with
$\sum_j q(j) = 1$, then $\mathrm{tr}(\rho\sigma) = \sum_{i,j} p(i) q(j) |\langle\psi_i|\chi_j\rangle|^2$. This implies, as $p(i)$, $q(j)$
and $|\langle\psi_i|\chi_j\rangle|^2$ are nonnegative real numbers, that $\mathrm{tr}(\rho\sigma) = \Re\, \mathrm{tr}(\rho\sigma)$;

• $0 \le \mathrm{tr}(\rho\sigma) \le 1$. The lower bound follows again from the fact that the
sum is over nonnegative real numbers. To verify the upper bound we
substitute $|\langle\psi_i|\chi_j\rangle|^2 \le 1$, which gives

(2.19) $0 \le \mathrm{tr}(\rho\sigma) = \sum_{i,j} p(i) q(j) |\langle\psi_i|\chi_j\rangle|^2 \le \sum_{i,j} p(i) q(j) = 1$.

The minimum error discrimination of mixed quantum states is an interesting and 
well studied problem. We refer the interested reader to pioneering work of [T31 [TS] 
and further results in [tJJ [H] and the literature cited therein. 
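
The properties of density operators listed in this section can be illustrated with a small numerical example. The sketch below assumes qubit ensembles given as lists of (probability, state) pairs; the helper names `mix` and `trace_of_product` are hypothetical.

```python
import math

def outer(psi):
    """Projector |psi><psi| as a list of rows."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def mix(ensemble):
    """rho = sum_i p_i |psi_i><psi_i| for a list of (p_i, psi_i), Eq. (2.17)."""
    n = len(ensemble[0][1])
    rho = [[0j] * n for _ in range(n)]
    for p, psi in ensemble:
        proj = outer(psi)
        for r in range(n):
            for c in range(n):
                rho[r][c] += p * proj[r][c]
    return rho

def trace_of_product(A, B):
    """tr(AB) without forming the full matrix product."""
    n = len(A)
    return sum(A[r][k] * B[k][r] for r in range(n) for k in range(n))

s = 1 / math.sqrt(2)
pure = mix([(1.0, [s + 0j, s + 0j])])                       # |+><+|
mixed = mix([(0.5, [1 + 0j, 0j]), (0.5, [0j, 1 + 0j])])     # unbiased classical coin

trace_mixed = sum(mixed[i][i] for i in range(2)).real       # tr(rho) = 1
purity_pure = trace_of_product(pure, pure).real             # tr(rho^2) = 1
purity_mixed = trace_of_product(mixed, mixed).real          # tr(rho^2) < 1
overlap = trace_of_product(pure, mixed).real                # lies in [0, 1], Eq. (2.19)
```

Both operators give probability one half for each computational basis outcome, yet only `pure` has purity one; this is how the density operator separates the two kinds of randomness discussed above.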

3. Encoding classical Information as Quantum Information 



In Section 2.2 we have seen that a set of measurement operators $M = \{M_i\}_{i=0}^{m-1}$
and a quantum state $|\psi\rangle$ uniquely determine a probability distribution
$P(X = i) = p(i)$, $i = 0, \ldots, m-1$. In this section we want to analyze the converse,
i.e. given a probability distribution we want to determine quantum states and
measurement operators that produce it.

Let us consider a random variable $X$. The set of values $X$ can take is given by
$S$ and for any probability distribution we have

(3.1) $\sum_{x \in S} P(X = x) = 1$, $1 \ge P(X = x) \ge 0$, for all $x \in S$.

A given probability distribution may be interpreted as a vector from the $n = |S|$
dimensional real vector space $\mathbb{R}^n$. In this case the basis of the space can be indexed
by the elements of $S$ and $|x\rangle$, $x \in S$, denotes a basis vector. In this basis we define
$|p\rangle = |p(X)\rangle = \sum_{x \in S} p(x) |x\rangle$. Due to the linearity of the normalization condition
(3.1) the set of all probability distributions lies in an $(n-1)$-dimensional hyperplane,
sometimes called simplex, in this space.

As we would like to make a connection between the set of all probability
distributions in $\mathbb{R}^n$ and quantum states from a complex Hilbert space $\mathbb{C}^n = \mathcal{H}$, we
formally introduce complex numbers $a_{\varphi_x} \in \mathbb{C}$ such that

(3.2) $p(x) = |a_{\varphi_x}|^2$.

The choice of $a_{\varphi_x}$ is not unique. Using Euler's Equation on

$a_{\varphi_x} = \sqrt{p(x)}\, e^{i\varphi_x} = \sqrt{p(x)} (\cos\varphi_x + i \sin\varphi_x)$, $-\pi < \varphi_x \le \pi$,

Equation (3.2) can be rewritten as

(3.3) $p(x) = a^*_{\varphi_x} a_{\varphi_x} = p(x) \cos^2\varphi_x + p(x) \sin^2\varphi_x$.

It is noteworthy that all numbers $a_{\varphi_x}$ fulfilling (3.2) lie on a circle in the complex
plane with radius $\sqrt{p(x)}$, as can be verified from (3.3).

Already here we see that encoding probabilities as quantum states requires some
more information than only the probability distribution if we want to be able to
address all quantum states.$^2$ If we restrict the encoding onto the real nonnegative
square root we gain the same quantum state as described in
Consider the (state) vector $|\psi\rangle \in \mathbb{C}^n$:

(3.4) $|\psi\rangle = \sum_{x \in S} a_{\varphi_x} |x\rangle$.

The set of all projectors onto basis states, $\{P_{|x\rangle} = |x\rangle\langle x|, x \in S\}$, taken as
measurement operators, results in the probability distribution $p(X)$ when $|\psi\rangle$ is measured,
and fulfills the completeness condition (2.7).

Using (3.3) and the normalization condition (3.1) we see that $|\psi\rangle$ is a unit vector
in $\mathbb{C}^n$. These unit vectors are related to probability distributions via (3.2) and lie
on a complex unit hyper-sphere centered at the origin.

We will call the assignment

(3.5) $E(p(X = x), C) \mapsto a_{\varphi_x}$

quantum encoding of $p(X)$ with respect to the conjugate information $C$. If we want
to decode the probability information of the quantum encoding we have to apply
$\{P_{|x\rangle} = |x\rangle\langle x|, x \in S\}$ to $|\psi\rangle$.
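
The encoding (3.2) to (3.5) and its decoding by basis projectors can be sketched as follows. This is an illustration only: the phase values stand in for the conjugate information $C$ and are arbitrary, and `encode` and `decode` are hypothetical names.

```python
import cmath
import math

def encode(p, phases):
    """Amplitudes a_x = sqrt(p(x)) * exp(i * phi_x), one per outcome x."""
    return [math.sqrt(px) * cmath.exp(1j * ph) for px, ph in zip(p, phases)]

def decode(state):
    """Measuring with the projectors |x><x| yields p(x) = |a_x|^2."""
    return [abs(a) ** 2 for a in state]

p = [0.1, 0.2, 0.7]
state_real = encode(p, [0.0, 0.0, 0.0])            # real nonnegative square roots
state_phased = encode(p, [0.3, -1.2, math.pi])     # same p, different phases

decoded_real = decode(state_real)
decoded_phased = decode(state_phased)
norm_sq = sum(abs(a) ** 2 for a in state_phased)   # unit vector by (3.1)
```

Both states decode to the same distribution `p`, which shows that the probability distribution alone does not fix the quantum state; the phases carry the additional information.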

3.1. Distances. When given two probability distributions over the same index set 
a natural and important question is: how similar or how close are they? The 
answers to this question can be quite diverse depending on the scientific discipline 
they have been derived from. Table 2 gives an overview of the more frequently used
probability distribution distance measures as they can be found in the literature. 

Likewise we find adapted distance measures for quantum states and/or density 
operators (0). Based on the transition probability of quantum states we may 
define a distance function: 

Lemma 3.1. Let $\rho, \sigma \in L_\mathcal{H}$ be density operators ($\mathrm{tr}\,\rho = \mathrm{tr}\,\sigma = 1$). Then the
function

$D(\rho, \sigma) = \sqrt{1 - \mathrm{tr}(\rho\sigma)}$

satisfies the following properties:
(1) $0 \le D(\rho, \sigma) \le 1$,

$^2$ To be more precise: we need to know the meaning of '$-\sqrt{p(x)}$'. This cannot be derived from
the probability distribution alone. One bit of additional information is required. If this bit of
information is available, the complex part ensures a smooth transition from $\sqrt{p(x)}$ to $-\sqrt{p(x)}$,
i.e. without changing the probability of $|x\rangle$ if measured in the same basis.
Name                                        Formula                                                        Metric

Euclidian distance                          $\sqrt{\sum_i (p(i) - q(i))^2}$                                yes
trace or variational distance (1171)        $\sum_i |p(i) - q(i)|$                                         yes
relative $\chi^2$                           $\sum_i \frac{(p(i) - q(i))^2}{2(p(i) + q(i))}$                yes
relative entropy or Kullback-Leibler
divergence ([TBI)                           $-H(p) - \sum_i p(i) \log q(i)$                                no
Jensen-Shannon divergence or
symmetric relative entropy (1171)           $JS(p, q) = H(\frac{1}{2}(p + q)) - \frac{1}{2}(H(p) + H(q))$  yes
Bhattacharyya distance f [Tl HB])           $1 - \sum_i \sqrt{p(i)} \sqrt{q(i)}$                           yes

Table 2. Frequently used probability distribution distance measures.
$p$ and $q$ are probability distributions over the same index
set. $H(p) = -\sum_i p(i) \log p(i)$ is the Shannon entropy (fll IT01 12] ).



(2) $D(\rho, \sigma) = D(\sigma, \rho)$, (symmetry)

(3) for any density operator $\tau \in L_\mathcal{H}$, $\mathrm{tr}\,\tau = 1$:

$D(\rho, \sigma) \le D(\rho, \tau) + D(\tau, \sigma)$, (triangle inequality)

(4) on pure states $D(\cdot, \cdot)$ defines a metric.

The fact that $D$ defines a metric on pure states is well known ([JJ5], p. 500). In
the literature it is sometimes called the no-name metric and it is closely related to the
so-called Fubini-Study metric. Nevertheless, we would like to prove Lemma 3.1, as
the proof offers some insight into the relation of $D$ and the Euclidian metric of the
operator space and the principal system.



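Proof. Property (1) of Lemma 3.1 is a direct consequence of Equation (2.19).
Property (2) follows from the fact that the trace of an operator product is invariant
against cyclic permutations of the arguments, i.e. $\mathrm{tr}(\rho\sigma) = \mathrm{tr}(\sigma\rho)$.

Let $\rho, \sigma, \tau \in L_\mathcal{H}$ be density operators, i.e. $\mathrm{tr}\,\rho = \mathrm{tr}\,\sigma = \mathrm{tr}\,\tau = 1$ and $\|\rho\| \le 1$,
$\|\sigma\| \le 1$, $\|\tau\| \le 1$. To see property (3) we first note that the Euclidian
distance (see Equation 2.14) of the operator space fulfills the triangle inequality for
all operators in $L_\mathcal{H}$, i.e. $\|\rho - \sigma\| \le \|\rho - \tau\| + \|\tau - \sigma\|$ holds. We calculate

$\|\rho - \sigma\|^2 \le \left(\|\rho - \tau\| + \|\tau - \sigma\|\right)^2$

$\|\rho\|^2 + \|\sigma\|^2 - 2\,\mathrm{tr}(\rho\sigma) \le \|\rho - \tau\|^2 + \|\tau - \sigma\|^2 + 2 \|\rho - \tau\| \|\tau - \sigma\|$

$= \|\rho\|^2 + \|\tau\|^2 - 2\,\mathrm{tr}(\rho\tau) + \|\tau\|^2 + \|\sigma\|^2 - 2\,\mathrm{tr}(\tau\sigma) + 2 \|\rho - \tau\| \|\tau - \sigma\|$,

where we used Equation 2.14 in the last step. Simplifying and expanding the mixed
term gives:

$-\mathrm{tr}(\rho\sigma) \le \|\tau\|^2 - \mathrm{tr}(\rho\tau) - \mathrm{tr}(\tau\sigma) + \sqrt{\left[\|\rho\|^2 + \|\tau\|^2 - 2\,\mathrm{tr}(\rho\tau)\right] \left[\|\tau\|^2 + \|\sigma\|^2 - 2\,\mathrm{tr}(\tau\sigma)\right]}$.

By substituting $\|\rho\|^2 + \|\tau\|^2 \le 2$ and $\|\tau\|^2 + \|\sigma\|^2 \le 2$ we get

(3.6) $-\mathrm{tr}(\rho\sigma) \le \|\tau\|^2 - \mathrm{tr}(\rho\tau) - \mathrm{tr}(\tau\sigma) + 2\sqrt{(1 - \mathrm{tr}(\rho\tau))(1 - \mathrm{tr}(\tau\sigma))}$.

With this we are ready to prove property (3) of the above Lemma. We substitute
$\|\tau\|^2 \le 1$ in (3.6) and add 1 to both sides. Then

$D(\rho, \sigma)^2 = 1 - \mathrm{tr}(\rho\sigma) \le 2 - \mathrm{tr}(\rho\tau) - \mathrm{tr}(\tau\sigma) + 2\sqrt{(1 - \mathrm{tr}(\rho\tau))(1 - \mathrm{tr}(\tau\sigma))}$

$= \left(\sqrt{1 - \mathrm{tr}(\rho\tau)} + \sqrt{1 - \mathrm{tr}(\tau\sigma)}\right)^2$

$= \left(D(\rho, \tau) + D(\tau, \sigma)\right)^2$.

Reading top to bottom we see that the triangle inequality holds for $D(\cdot, \cdot)$ and all
$\rho, \sigma, \tau \in L_\mathcal{H}$:

(3.7) $D(\rho, \sigma) \le D(\rho, \tau) + D(\tau, \sigma)$.

For property (4) we have to verify that $D(\rho, \rho) = 0$. This is only true for pure
states, as in general $\mathrm{tr}\,\rho^2 \le 1$ with equality if and only if $\rho$ is pure. On the other
hand, for pure states $D(\cdot, \cdot)$ fulfills all the requirements of a metric. □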
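
The statements of Lemma 3.1 can be spot-checked on a few qubit density operators. The sketch below is a numerical illustration for three hand-picked real-valued matrices, not a proof; the function name `D` mirrors the lemma and the variable names are assumptions.

```python
import math

def trace_of_product(A, B):
    """tr(AB) for square matrices given as lists of rows."""
    n = len(A)
    return sum(A[r][k] * B[k][r] for r in range(n) for k in range(n))

def D(rho, sigma):
    """D(rho, sigma) = sqrt(1 - tr(rho sigma)); clamped against rounding below zero."""
    return math.sqrt(max(0.0, 1.0 - complex(trace_of_product(rho, sigma)).real))

rho = [[1.0, 0.0], [0.0, 0.0]]      # pure state |0><0|
sigma = [[0.5, 0.5], [0.5, 0.5]]    # pure state |+><+|
tau = [[0.5, 0.0], [0.0, 0.5]]      # maximally mixed state

d_rs, d_rt, d_ts = D(rho, sigma), D(rho, tau), D(tau, sigma)
triangle_holds = d_rs <= d_rt + d_ts
self_distance_pure = D(rho, rho)    # zero: rho is pure
self_distance_mixed = D(tau, tau)   # nonzero: tr(tau^2) < 1
```

The nonzero value of `self_distance_mixed` is exactly the reason property (4) is restricted to pure states.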

Indeed, for pure states $D$ is identical to the Euclidian norm of the operator space
up to a constant factor, which is easy to verify by noting that for any pure state
$|\psi\rangle$ and any density operator $\sigma$ the following holds (see Eq. 2.12):

(3.8) $\mathrm{tr}(|\psi\rangle\langle\psi| \sigma) = \langle\psi| \sigma |\psi\rangle$.

If $\sigma$ is pure as well, i.e. $\sigma = |\xi\rangle\langle\xi|$, we get $\mathrm{tr}(|\psi\rangle\langle\psi| \sigma) = \mathrm{tr}(|\psi\rangle\langle\psi|\xi\rangle\langle\xi|) = |\langle\psi|\xi\rangle|^2$.
With this and Equation 2.14 the distance computes to
$\| |\psi\rangle\langle\psi| - |\xi\rangle\langle\xi| \| = \sqrt{2} \sqrt{1 - |\langle\psi|\xi\rangle|^2}$
and $D(|\psi\rangle\langle\psi|, |\xi\rangle\langle\xi|) = \sqrt{1 - |\langle\psi|\xi\rangle|^2}$. For convenience we will
write $D(|\psi\rangle, |\xi\rangle) = D(|\psi\rangle\langle\psi|, |\xi\rangle\langle\xi|)$ for short when dealing with pure states only.
On pure states the no-name metric therefore takes the form

(3.9) $D(|\psi\rangle, |\xi\rangle) = \sqrt{1 - |\langle\xi|\psi\rangle|^2}$, with $|\psi\rangle = \sum_x a_{\varphi_x} |x\rangle$, $|\xi\rangle = \sum_x \beta_{\chi_x} |x\rangle$.

One may derive an intuitive geometric argument to compare two state vectors in
$\mathbb{C}^n$ (see Section 3.2 and Figure 1). The vectors $|\psi\rangle$ and $|\xi\rangle$ in general span a
two-dimensional complex plane and the intersection of this plane with the hyper-sphere
generates a complex unit circle. If $|\psi\rangle = |\xi\rangle$ the intersection degenerates into two
points.

Remark (a word of care): Given two probability distributions of the same random
variable over $S$ we may construct corresponding (diagonal) density operators by
$\rho = \sum_{x \in S} p(x) |x\rangle\langle x|$, $\sigma = \sum_{x \in S} q(x) |x\rangle\langle x|$. Their squared Euclidian operator distance
$\|\rho - \sigma\|^2 = \mathrm{tr}(\rho^2) + \mathrm{tr}(\sigma^2) - 2\,\mathrm{tr}(\rho\sigma)$ is equal to the squared Euclidian distance of the probability
distributions. But $\|\rho - \sigma\|^2 \ne 2(1 - \mathrm{tr}(\rho\sigma)) = 2 D(\rho, \sigma)^2$ unless both distributions
are pure, i.e. for both probability distributions there exists one event, let us say
$x_p$, $x_q$, such that $p(x_p) = q(x_q) = 1$.

For the rest of the chapter we will assume pure states only. 



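3.1.1. Projection, transition probability. Let $|\psi\rangle = \sum_{x \in S} a_{\varphi_x} |x\rangle$, $a_{\varphi_x} = \sqrt{p(x)}\, e^{i\varphi_x}$,
and $|\xi\rangle = \sum_{x \in S} \beta_{\chi_x} |x\rangle$, $\beta_{\chi_x} = \sqrt{q(x)}\, e^{i\chi_x}$. As we have seen in Section 2.2,
Equation (2.8), the projector $P$ onto $|\psi\rangle$ is given by $P = |\psi\rangle\langle\psi|$ and the transition
probability

$\langle\xi| P |\xi\rangle = |\langle\xi|\psi\rangle|^2$

gives the square of the length of the projection of $|\xi\rangle$ onto $|\psi\rangle$. Substituting
$\gamma_x = \varphi_x - \chi_x$ and using Euler's Equation, we get:

$\langle\xi|\psi\rangle = \sum_{x \in S} \sqrt{p(x) q(x)}\, e^{i\gamma_x}$

(3.10) $= \sum_{x \in S} \cos\gamma_x \sqrt{p(x) q(x)} + i \sum_{x \in S} \sin\gamma_x \sqrt{p(x) q(x)}$

and finally

$|\langle\xi|\psi\rangle|^2 = \Big(\sum_{x \in S} \cos\gamma_x \sqrt{p(x) q(x)}\Big)^2 + \Big(\sum_{x \in S} \sin\gamma_x \sqrt{p(x) q(x)}\Big)^2$

$= \sum_{x, y \in S} \cos(\gamma_x - \gamma_y) \sqrt{p(x) p(y) q(x) q(y)}$

(3.11) $= \sum_{x \in S} p(x) q(x) + \sum_{x, y \ne x \in S} \cos(\gamma_x - \gamma_y) \sqrt{p(x) p(y) q(x) q(y)}$,

where we used $\cos(\gamma_x - \gamma_y) = \cos\gamma_x \cos\gamma_y + \sin\gamma_x \sin\gamma_y$ and note that all mixed
terms ($\pm i \cos\gamma_x \sin\gamma_y, \ldots$) disappear.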
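
Equation (3.11) can be compared against the direct evaluation of $|\langle\xi|\psi\rangle|^2$. The following sketch uses arbitrary example distributions and phases; all names are illustrative and not taken from the text.

```python
import cmath
import math

def encode(p, phases):
    """Amplitudes sqrt(p(x)) * exp(i * phase_x)."""
    return [math.sqrt(px) * cmath.exp(1j * ph) for px, ph in zip(p, phases)]

def overlap_sq_direct(psi, xi):
    """|<xi|psi>|^2 computed from the amplitudes."""
    return abs(sum(b.conjugate() * a for a, b in zip(psi, xi))) ** 2

def overlap_sq_formula(p, q, gamma):
    """Right-hand side of (3.11) with gamma_x = phi_x - chi_x."""
    n = len(p)
    diagonal = sum(p[x] * q[x] for x in range(n))
    cross = sum(math.cos(gamma[x] - gamma[y]) * math.sqrt(p[x] * p[y] * q[x] * q[y])
                for x in range(n) for y in range(n) if x != y)
    return diagonal + cross

p, q = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
phi, chi = [0.1, 1.0, -2.0], [0.4, -0.7, 0.9]
gamma = [a - b for a, b in zip(phi, chi)]

direct = overlap_sq_direct(encode(p, phi), encode(q, chi))
formula = overlap_sq_formula(p, q, gamma)
```

The two values agree up to rounding, and only the phase differences `gamma` enter the result, not the individual phases.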

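3.1.2. Length, Euclidian distance of the principal system. It is useful to compute
the Euclidian distance of the principal system as well in order to compare it with
the metric $D$. This is given by the length of the difference $\| |\psi\rangle - |\xi\rangle \|$:

$\| |\psi\rangle - |\xi\rangle \|^2 = \sum_{x \in S} |a_{\varphi_x} - \beta_{\chi_x}|^2$

$= \sum_{x \in S} p(x) + q(x) - \sqrt{p(x) q(x)} \left(e^{i\gamma_x} + e^{-i\gamma_x}\right)$

$= \sum_{x \in S} p(x) + q(x) - 2 \sqrt{p(x) q(x)} \cos\gamma_x$

with $\gamma_x = \varphi_x - \chi_x$. Simplifying by using the normalization condition (3.1) and
Equation (3.10) we get:

(3.12) $\| |\psi\rangle - |\xi\rangle \|^2 = 2 - 2 \sum_{x \in S} \cos\gamma_x \sqrt{p(x) q(x)} = 2 \left(1 - \Re \langle\xi|\psi\rangle\right)$.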

3.1.3. Relation between Projection, no-name metric and Euclidian distance of the
principal system. Let $P = |\psi\rangle\langle\psi| = U^\dagger D U$ be the spectral decomposition of $P$,
where $D$ is a diagonal matrix with non-negative diagonal entries, and $U$ is unitary.
Clearly, $\langle\psi| P |\psi\rangle = 1$ and therefore $U |\psi\rangle = |k\rangle$ is a basis vector in some
appropriate basis system and $D = |k\rangle\langle k|$. Define $|\xi'\rangle = U |\xi\rangle = \sum_j a'_j |j\rangle$. Then, with
$a'_j = a_j + i b_j$, $a_j, b_j \in \mathbb{R}$,

$\langle\xi| P |\xi\rangle = \langle\xi| U^\dagger D U |\xi\rangle = \langle\xi'|k\rangle \langle k|\xi'\rangle = a'^*_k a'_k = a_k^2 + b_k^2$,

or in terms of the no-name metric we get:

$D(|\psi\rangle, |\xi\rangle)^2 = 1 - a_k^2 - b_k^2$.

The Euclidian distance computes to (see 3.12):

$\| |\psi\rangle - |\xi\rangle \|^2 = \| U |\psi\rangle - U |\xi\rangle \|^2 = \| |k\rangle - |\xi'\rangle \|^2$

$= 2 \left(1 - \Re \langle\xi'|k\rangle\right) = 2 \left(1 - \Re\, a'_k\right) = 2 (1 - a_k)$.

If we assume a real nonnegative principal system this might give some insight into
distance functions used in the literature ( [TJ [221 E] ). In this case, as $b = 0$, $0 \le a \le 1$
and $1 - a \le 1 - a^2$, we have (see Fig. 1): $D(|\psi\rangle, |\xi\rangle) \le \| |\psi\rangle - |\xi\rangle \| \le \sqrt{2}\, D(|\psi\rangle, |\xi\rangle)$.

3.2. Dropping the phase. The relation between the Euclidian distance of the
principal system and the no-name metric becomes slightly easier (but misleading in
general) when we consider no phase information, i.e. when $p$ and $q$ are represented
by their special nonnegative and real choice $|\psi_p\rangle$ and $|\xi_q\rangle$. In this case the complex
Hilbert space $\mathbb{C}^n$ is reduced to a real Hilbert space of the same dimension and we
may drop the modulus as all probabilities are mapped onto the positive square root.
We can visualize the relation of the two measures as can be seen in Figure 1.
The inner product $\langle\xi_q|\psi_p\rangle$ gives the cosine of the angle between $|\psi_p\rangle$ and $|\xi_q\rangle$
and equals the length of the projection of $|\xi_q\rangle$ onto $|\psi_p\rangle$. Likewise $\sqrt{1 - \langle\xi_q|\psi_p\rangle^2}$
gives the sine of the angle and equals the length of the projection of $|\xi_q\rangle$ onto
the orthogonal complement of $|\psi_p\rangle$ in the plane spanned by the two state vectors.
The length of the difference of the two vectors in terms of the inner product is given
by (Eq. 3.12)

(3.13) $\| |\psi_p\rangle - |\xi_q\rangle \| = \sqrt{2} \sqrt{1 - \langle\xi_q|\psi_p\rangle}$.

This relates to a measure between density distributions frequently used in the
literature that is based on the so-called Bhattacharyya coefficients:

(3.14) $d(p(X), q(X)) = d(p, q) = \sqrt{1 - \sum_{x \in S} \sqrt{p(x) q(x)}} = \sqrt{1 - \langle\xi_q|\psi_p\rangle}$,

which up to a constant factor is equal to (3.13). The sum over the Bhattacharyya
coefficients, $f(p(X), q(X)) = \sum_{x \in S} \sqrt{p(x) q(x)}$, is usually called the fidelity of the
probability distributions $p$ and $q$. As an example, (3.14) is used in [6] to compare
density functions in the context of object tracking.

Let $\omega$ be the angle between $|\psi_p\rangle$ and $|\xi_q\rangle$. Substituting $\cos\omega = \langle\xi_q|\psi_p\rangle$ and
observing that $\sin\frac{\omega}{2} = \frac{1}{\sqrt{2}} \sqrt{1 - \cos\omega}$, we get:

$D(p(X), q(X)) = \sin\omega$

$\| |\psi_p\rangle - |\xi_q\rangle \| = \sqrt{2}\, d(p, q) = 2 \sin\frac{\omega}{2}$.

For the rest of the paper we will limit our discussion to two-dimensional
systems, as they can be visualized and are fairly powerful instruments for concrete
applications.
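
The relations of this section between the fidelity, the no-name metric and the Euclidian distance can be verified for a concrete pair of distributions. This is a minimal sketch with arbitrary example values for `p` and `q`; the variable names are chosen here for illustration.

```python
import math

def fidelity(p, q):
    """Sum of the Bhattacharyya coefficients, sum_x sqrt(p(x) q(x))."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

f = fidelity(p, q)                 # cos(omega) for the real nonnegative encoding
omega = math.acos(f)               # angle between the two state vectors
D = math.sqrt(1 - f ** 2)          # no-name metric, equals sin(omega)
d = math.sqrt(1 - f)               # distance (3.14)
euclid = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))
```

Here `euclid` is computed directly from the square-root vectors and reproduces both $\sqrt{2}\, d(p, q)$ and $2 \sin(\omega/2)$; it also lies between $D$ and $\sqrt{2}\, D$, as stated at the end of Section 3.1.3.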

4. Visualization of D, the two-dimensional case 

It is worthwhile to study the behavior of the transition probability and the
no-name metric more closely. For the two-dimensional Hilbert space $\mathbb{C}^2$ and pure states
$|\psi\rangle, |\xi\rangle \in \mathbb{C}^2$ we can visualize Equation 3.11 and consequently $D(|\psi\rangle, |\xi\rangle)$.

Consider density distributions $p$, $q$ over the random variable $X = \{x, y\}$. In this
case the encoding of $p$ and $q$ as quantum states results in qubits. We want to
analyze the impact of the phase factor on the distance of the qubits.

Using Equation 3.11, Equation (3.9) then takes the form

$D(|\psi\rangle, |\xi\rangle)^2 = 1 - p(x) q(x) - p(y) q(y) - 2 \cos(\gamma_x - \gamma_y) \sqrt{p(x) p(y) q(x) q(y)}$.

Figure 1. Geometric derivation of several distance measures in
$\mathbb{R}^n$. The intersection of the plane spanned by $|\psi_p\rangle$ and $|\xi_q\rangle$ with
the unit hypersphere of $\mathbb{R}^n$ generates a unit circle.

Alternatively, we may express $D(|\psi\rangle, |\xi\rangle)$ in terms of $p(x) = |\alpha_x|^2$, $p(y) = |\alpha_y|^2$,
$q(x) = |\beta_x|^2$ and $q(y) = |\beta_y|^2$:

$D(|\psi\rangle, |\xi\rangle)^2 = 1 - |\alpha_x|^2 |\beta_x|^2 - |\alpha_y|^2 |\beta_y|^2 - 2 \cos(\gamma_x - \gamma_y) |\alpha_x| |\alpha_y| |\beta_x| |\beta_y|$.

(1) Let $p$ be fixed and $q$ vary over all possible distributions. As $p$ is fixed the
phase difference $\gamma_x - \gamma_y$ only depends on $q$ and we may consider all possible
phase differences. A visualization of $D(|\psi\rangle, |\xi\rangle)$ for several choices of $p$ is
given in Figure 2.

(a) Clearly $D(|\psi\rangle, |\xi\rangle) = 0 \Leftrightarrow \alpha_x = \beta_x, \alpha_y = \beta_y$, as then $\gamma_x - \gamma_y = 0$ and

$D(|\psi\rangle, |\xi\rangle)^2 = 1 - \left(|\alpha_x|^2 + |\alpha_y|^2\right)^2 = 1 - (p(x) + p(y))^2 = 0$.

(b) Conversely, $D(|\psi\rangle, |\xi\rangle) = 1 \Leftrightarrow \alpha_x = |\beta_y|\, e^{i(\chi_x \pm \frac{\pi}{2} + \eta)}$,
$\alpha_y = |\beta_x|\, e^{i(\chi_y \mp \frac{\pi}{2} + \eta)}$, $\eta \in \mathbb{R}$, which gives $\gamma_x - \gamma_y = \pm\pi$, resulting in

$D(|\psi\rangle, |\xi\rangle)^2 = 1 - |\beta_y|^2 |\beta_x|^2 - |\beta_x|^2 |\beta_y|^2 + 2 |\beta_x|^2 |\beta_y|^2 = 1$.

(2) Now let $\gamma_x - \gamma_y$ be fixed and $p$, $q$ vary over all possible distributions. A
visualization of $D(|\psi\rangle, |\xi\rangle)$ for several choices of $\gamma_x - \gamma_y$ is given in Figure 3.

(a) $\gamma_x - \gamma_y = 0$:

$D(|\psi\rangle, |\xi\rangle)^2 = 1 - p(x) q(x) - p(y) q(y) - 2 \sqrt{p(x) p(y) q(x) q(y)}$

$= 1 - \left(\sqrt{p(x) q(x)} + \sqrt{p(y) q(y)}\right)^2$

$D(|\psi\rangle, |\xi\rangle) = 0 \Leftrightarrow p(x) = q(x)$ and

$D(|\psi\rangle, |\xi\rangle) = 1 \Leftrightarrow p(x) = q(y) = 0 \vee p(x) = q(y) = 1$.

When $p(x) = q(x)$ for all $x \in X$ we have a perfect correlation.



[Figure 2: four panels, p(x)=0, p(y)=1; p(x)=1/4; p(x)=1/2; p(x)=2/3 (each with p(y)=1-p(x)); horizontal axis q(x).]

Figure 2. Visualization of $D(|\psi\rangle, |\xi\rangle)$ for $X = \{x, y\}$ and
different values of $p$. $q$ ranges over all density distributions and all phase
differences between $0$ and $\pi$. The range from $-\pi$ to $0$ is symmetric.



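(b) $\gamma_x - \gamma_y = \pm\frac{\pi}{3}$:

$D(|\psi\rangle, |\phi\rangle)^2 = 1 - p(x)q(x) - p(y)q(y) - \sqrt{p(x)p(y)q(x)q(y)}$

$D(|\psi\rangle, |\phi\rangle) = 0 \iff p(x) = q(x) = 0 \lor p(x) = q(x) = 1$ and

$D(|\psi\rangle, |\phi\rangle) = 1 \iff p(x) = q(y) = 0 \lor p(x) = q(y) = 1.$

(c) $\gamma_x - \gamma_y = \pm\frac{\pi}{2}$:

$D(|\psi\rangle, |\phi\rangle)^2 = 1 - p(x)q(x) - p(y)q(y)$

$D(|\psi\rangle, |\phi\rangle) = 0 \iff p(x) = q(x) = 0 \lor p(x) = q(x) = 1$ and

$D(|\psi\rangle, |\phi\rangle) = 1 \iff p(x) = q(y) = 0 \lor p(x) = q(y) = 1.$

(d) $\gamma_x - \gamma_y = \pm\pi$:

$D(|\psi\rangle, |\phi\rangle)^2 = 1 - p(x)q(x) - p(y)q(y) + 2\sqrt{p(x)p(y)q(x)q(y)}$

$\qquad = 1 - \left(\sqrt{p(x)q(x)} - \sqrt{p(y)q(y)}\right)^2$

$D(|\psi\rangle, |\phi\rangle) = 0 \iff p(x) = q(x) = 0 \lor p(x) = q(x) = 1$ and

$D(|\psi\rangle, |\phi\rangle) = 1 \iff p(x) = 1 - q(x).$

When $p(x) = 1 - q(x)$ for all $x \in X$ we have a perfect anti-correlation. Figure 4 shows $D(|\psi\rangle, |\phi\rangle)$ if $p = q$ and $q$ ranges over all phase differences between $0$ and $\pi$.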
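The anti-correlation condition in case (d) follows in a few steps from the factored form of the squared distance. The short derivation below spells them out, using only $p(y) = 1 - p(x)$ and $q(y) = 1 - q(x)$:

```latex
\begin{align*}
D(|\psi\rangle,|\phi\rangle) = 1
  &\iff \sqrt{p(x)q(x)} = \sqrt{p(y)q(y)} \\
  &\iff p(x)q(x) = \bigl(1-p(x)\bigr)\bigl(1-q(x)\bigr) \\
  &\iff p(x) + q(x) = 1 .
\end{align*}
```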
Figure 3. Visualization of $D(|\psi\rangle, |\phi\rangle)$ for $X = \{x, y\}$ and a fixed phase difference. $p$ and $q$ range over all possible density distributions.

Figure 4. Visualization of $D(|\psi\rangle, |\phi\rangle)$ for $X = \{x, y\}$ and $p = q$, i.e. $p(x) = q(x)$ and $p(y) = q(y)$. The phase difference $\gamma_x - \gamma_y$ ranges over all phase differences between $0$ and $\pi$. The range from $-\pi$ to $0$ is symmetric.



4.1. Visualization of qubits, the Bloch sphere. A very useful visualization 
technique that is limited to two dimensional systems is given by the so called Bloch 
sphere. By ignoring a physically irrelevant overall phase factor, the general state of 
a qubit can be written as 

(4.1) $|\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle$ with $\varphi \in [0, 2\pi)$, $\theta \in [0, \pi]$.

If we interpret $\theta$ and $\varphi$ as spherical coordinates

(4.2) $\vec{r} = (\cos\varphi\sin\theta, \sin\varphi\sin\theta, \cos\theta) = (x, y, z)$,

every qubit state has a unique representation as a point on the three-dimensional unit sphere, also known as the Bloch sphere (see Fig. 5).

The unit vector $\vec{r} = \vec{r}_\psi$ is called the Bloch vector of $|\psi\rangle$. Bloch vectors have the property that

(4.3) $\vec{r}_\psi = -\vec{r}_\chi \iff \langle\psi|\chi\rangle = 0.$
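As a small illustration of Equations (4.1) to (4.3), the following sketch computes the Bloch vector of a qubit from its two amplitudes and checks that an orthogonal state sits at the antipodal point. The helper names `qubit` and `bloch_vector` and the sample angles are hypothetical and chosen for this example only.

```python
import numpy as np

def qubit(theta, phi):
    """State cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1> as in Equation (4.1)."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def bloch_vector(state):
    """Bloch vector (x, y, z) of a normalized qubit state a|0> + b|1>."""
    a, b = state
    cross = np.conj(a) * b
    return np.array([2.0 * cross.real, 2.0 * cross.imag, abs(a) ** 2 - abs(b) ** 2])

theta, phi = 0.8, 1.9                      # illustrative angles
psi = qubit(theta, phi)
chi = qubit(np.pi - theta, phi + np.pi)    # antipodal point on the sphere

print(bloch_vector(psi), bloch_vector(chi), abs(np.vdot(psi, chi)))
```

The Bloch vector is computed from the amplitudes rather than from the angles, so the same function also works for states that are not given in the form of Equation (4.1), as long as they are normalized.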

For density operators of a two-dimensional system a similar generalization exists. 
By again ignoring an overall phase, any single qubit density operator can be written 
as (see [IB] ) 

(4.4) $\rho = \frac{1}{2}\begin{pmatrix} 1 + z & x - iy \\ x + iy & 1 - z \end{pmatrix}.$

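For mixed states, the Bloch vector $\vec{r}_\rho = (x, y, z)$ lies inside the Bloch sphere ($\|\vec{r}_\rho\| < 1$) and for a classical unbiased coin flip state, i.e. $p(|0\rangle) = p(|1\rangle) = 1/2$, we get

(4.5) $\rho_c = \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \implies \vec{r}_{\rho_c} = (0, 0, 0).$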
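The correspondence between density operators and Bloch vectors can be sketched in a few lines. The function names below are illustrative; the purity $\mathrm{tr}(\rho^2)$ is added here only as a convenient check, since for a qubit it equals $(1 + \|\vec{r}_\rho\|^2)/2$.

```python
import numpy as np

def density_from_bloch(r):
    """Single-qubit density operator for a Bloch vector r = (x, y, z), cf. Equation (4.4)."""
    x, y, z = r
    return 0.5 * np.array([[1 + z, x - 1j * y], [x + 1j * y, 1 - z]])

def bloch_from_density(rho):
    """Recover (x, y, z) from a single-qubit density operator."""
    return np.array([2.0 * rho[1, 0].real, 2.0 * rho[1, 0].imag, (rho[0, 0] - rho[1, 1]).real])

def purity(rho):
    """tr(rho^2): 1 for pure states, 1/2 for the unbiased coin flip state."""
    return np.trace(rho @ rho).real

rho_c = density_from_bloch([0.0, 0.0, 0.0])       # centre of the Bloch sphere
rho_mixed = density_from_bloch([0.3, 0.4, 0.0])   # illustrative mixed state
print(rho_c, purity(rho_c), purity(rho_mixed))
```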

Unfortunately, there is no easy generalization of the Bloch sphere for higher 
dimensional quantum systems. 

5. A New Approach to Signal Segmentation using Conjugate 
Information Variables 

The segmentation of signals is one of the fundamental problems in the area of 
signal and image processing. This is especially true as many higher-level signal 
analysis algorithms rely heavily on the result of a low-level segmentation process. 
The term segmentation is mostly not well defined as it can refer to finding some 
objects in an image, e.g. a face, skin, coin, etc., or detecting lower-level features 
like edges, constant signal areas, etc. 

In this chapter we suggest a new approach for a specific segmentation problem, 
namely the segmentation of signals into locally constant, rising or falling parts, 
and minima or maxima. For simplicity we assume one-dimensional signals but 
remark that the approach is not limited to this and can be extended to two- or 
higher-dimensional problems (see Fig. [6]). 

Let the signal be given by $f(t)$ and let it be at least twice differentiable, that is $f'(t)$ and $f''(t)$ exist. In our approach the derivatives will be used as conjugate variables in order to classify local parts of functions.

We need the range of the derivatives to be within certain limits to normalize them properly. This can sometimes cause problems as we have to think about meaningful limits in a given application, but usually the limits are obvious for real-life signals, such as an image function where the pixels range from zero to one, etc. The limits of the derivatives will be used as soft thresholds that control each other.

Zero crossings of the first derivative indicate extrema of a function and the sign of the second derivative distinguishes between a maximum and a minimum. The function is constant or has a saddle point if both derivatives are zero. In the presence of noise, zero crossings of the first derivative become unstable whereas the sign of the second derivative remains robust unless there is a saddle point. We will exploit this fact and relate it to the role amplitude and phase play for qubits.
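Let

$p^{\pm}(t) = \frac{c_2 \pm f''(t)}{2c_2},$

$\alpha^{+}(t) = \sqrt{p^{+}(t)}\; e^{i\pi f'(t)/c_1},$
$\alpha^{-}(t) = \sqrt{p^{-}(t)},$

with $|f'(t)| \le c_1$, $|f''(t)| \le c_2$ for all $t$,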

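and define quantum states

(5.1) $|\psi(t)\rangle = \alpha^{+}(t)\,|z_+\rangle + \alpha^{-}(t)\,|z_-\rangle.$

Please note that as $|\alpha^{+}(t)|^2 + |\alpha^{-}(t)|^2 = 1$ for all $t$, the $|\psi(t)\rangle$ are well-defined qubits. $|z_+\rangle$, $|z_-\rangle$ describe the positive and negative z-axis of the Bloch sphere (see Fig. 5) and $f''(t) = \pm c_2 \implies |\psi(t)\rangle = |z_\pm\rangle$ independent of $f'(t)$. The positive and negative x and y axis respectively are given by

$|x_+\rangle = \frac{1}{\sqrt{2}}(|z_+\rangle + |z_-\rangle), \quad |x_-\rangle = \frac{1}{\sqrt{2}}(|z_+\rangle - |z_-\rangle),$

$|y_+\rangle = \frac{1}{\sqrt{2}}(|z_+\rangle + i|z_-\rangle), \quad |y_-\rangle = \frac{1}{\sqrt{2}}(|z_+\rangle - i|z_-\rangle).$

$f''(t) = f'(t) = 0 \iff |\psi(t)\rangle = |x_+\rangle$, whereas $f''(t) = 0 \land f'(t) = \pm c_1 \iff |\psi(t)\rangle = |x_-\rangle$. $f'(t)$ and $f''(t)$ are now conjugate variables in the sense that, observing the random variables $X = \{|x_+\rangle\langle x_+|, |x_-\rangle\langle x_-|\}$ and $Z = \{|z_+\rangle\langle z_+|, |z_-\rangle\langle z_-|\}$, the commutator between any two projectors of $X$ and $Z$ becomes maximal: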

$A_\pm = |x_\pm\rangle\langle x_\pm|,\ B_\pm = |z_\pm\rangle\langle z_\pm| \implies \|A_\pm B_\pm - B_\pm A_\pm\| = \frac{1}{2}$ (in the operator norm).
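The encoding and the commutator statement can be checked numerically. The sketch below is an illustration under assumptions: it takes $p^{\pm}(t) = (c_2 \pm f''(t))/(2c_2)$ for the amplitudes and puts a relative phase $\pi f'(t)/c_1$ on the $|z_+\rangle$ component, which reproduces the stated relations to $|z_\pm\rangle$ and $|x_\pm\rangle$; the function names, the default limits `c1 = c2 = 1` and the use of the operator norm are choices made for this example.

```python
import numpy as np

Z_PLUS = np.array([1.0, 0.0], dtype=complex)
Z_MINUS = np.array([0.0, 1.0], dtype=complex)
X_PLUS = (Z_PLUS + Z_MINUS) / np.sqrt(2)
X_MINUS = (Z_PLUS - Z_MINUS) / np.sqrt(2)

def encode(d1, d2, c1=1.0, c2=1.0):
    """Assumed encoding: f'' (d2) sets the amplitudes, f' (d1) sets the relative phase."""
    p_plus = (c2 + d2) / (2 * c2)
    p_minus = (c2 - d2) / (2 * c2)
    phase = np.exp(1j * np.pi * d1 / c1)
    return np.sqrt(p_plus) * phase * Z_PLUS + np.sqrt(p_minus) * Z_MINUS

def overlap(a, b):
    """Transition probability |<a|b>|^2, insensitive to a global phase."""
    return abs(np.vdot(a, b)) ** 2

def projector(v):
    """Rank-one projector |v><v|."""
    return np.outer(v, np.conj(v))

# Commutator of a projector of X with a projector of Z.
A = projector(X_PLUS)
B = projector(Z_PLUS)
commutator = A @ B - B @ A
op_norm = np.linalg.norm(commutator, 2)  # largest singular value

print(overlap(encode(0.0, 0.0), X_PLUS), overlap(encode(1.0, 0.0), X_MINUS), op_norm)
```

States are compared through their transition probability because the encoded state may differ from the axis state by a global phase.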
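Next we define a set of projectors:

$P_0 = |z_+\rangle\langle z_+|,$

$P_1 = |z_-\rangle\langle z_-|,$

$P_2 = \frac{1}{\sqrt{2}}\bigl(|y_-\rangle\langle y_-| + |x_-\rangle\langle x_-|\bigr) = \frac{1}{\sqrt{2}} I - \frac{1}{2\sqrt{2}}\bigl((1 + i)\,|z_-\rangle\langle z_+| + (1 - i)\,|z_+\rangle\langle z_-|\bigr),$

$P_3 = \frac{1}{\sqrt{2}}\bigl(|y_+\rangle\langle y_+| + |x_-\rangle\langle x_-|\bigr) = \frac{1}{\sqrt{2}} I - \frac{1}{2\sqrt{2}}\bigl((1 - i)\,|z_-\rangle\langle z_+| + (1 + i)\,|z_+\rangle\langle z_-|\bigr),$

$P_4 = |x_+\rangle\langle x_+|.$

Only $P_0$ and $P_1$ form a random variable as they add up to the identity and therefore fulfill the completeness condition. They indicate a minimum or maximum of the function. $P_i$, $i = 2, \ldots, 4$ would have to be complemented by $I - P_i$ in order to do so. The choice of projectors depends on the problem we want to solve.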



The commutator of two operators $A$, $B$ is given by $[A, B] = AB - BA$. Two operators commute if and only if their commutator equals zero, i.e. $AB = BA \iff [A, B] = 0$.
Figure 5. Top left: Bloch sphere representation of all qubit states. Top right: Bloch sphere representation of the qubit states $|\psi(t)\rangle$ for $f(t)$ as given in Equation 5.2. Middle left: Labeling result in the Bloch sphere representation. Middle right: Labeling of $f(t)$. Bottom left: Decision regions in the Bloch sphere representation; $P_1$ is shining through at the north pole but is not visible from this perspective. Bottom middle: Decision regions over the joint distribution of $f'(t)$ and $f''(t)$. Bottom right: Model-based (see Equations 5.3) decision regions over the joint distribution of $f'(t)$ and $f''(t)$. The limits of the axes are $\pm c_1$ for $f'(t)$ and $\pm c_2$ for $f''(t)$. If $c = c_1 = c_2$, the radius of the inner circle, $P_4$, is given by $c/2$.


For any given $f(t)$ we can now project the corresponding $|\psi(t)\rangle$ and get a classification result by maximizing over all projectors:

$C(f(t)) = i \in \{0, \ldots, 4\} : \langle\psi(t)|P_i|\psi(t)\rangle = \max_{j = 0, \ldots, 4} \langle\psi(t)|P_j|\psi(t)\rangle.$
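The maximization over the operators $P_0, \ldots, P_4$ can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the encoding of $f'$ and $f''$ (amplitudes from $f''$, relative phase $\pi f'/c_1$ on $|z_+\rangle$) and the forms of $P_2$ and $P_3$ as $\frac{1}{\sqrt{2}}(|y_\mp\rangle\langle y_\mp| + |x_-\rangle\langle x_-|)$ are assumed here, and all names are hypothetical.

```python
import numpy as np

S = 1 / np.sqrt(2)
Z_PLUS = np.array([1, 0], dtype=complex)
Z_MINUS = np.array([0, 1], dtype=complex)
X_PLUS = S * (Z_PLUS + Z_MINUS)
X_MINUS = S * (Z_PLUS - Z_MINUS)
Y_PLUS = S * (Z_PLUS + 1j * Z_MINUS)
Y_MINUS = S * (Z_PLUS - 1j * Z_MINUS)

def proj(v):
    """Rank-one projector |v><v|."""
    return np.outer(v, np.conj(v))

OPERATORS = [
    proj(Z_PLUS),                         # P0
    proj(Z_MINUS),                        # P1
    S * (proj(Y_MINUS) + proj(X_MINUS)),  # P2 (assumed form)
    S * (proj(Y_PLUS) + proj(X_MINUS)),   # P3 (assumed form)
    proj(X_PLUS),                         # P4
]

def encode(d1, d2, c1=1.0, c2=1.0):
    """Assumed encoding of f' (d1) and f'' (d2) as a qubit."""
    p_plus = (c2 + d2) / (2 * c2)
    p_minus = (c2 - d2) / (2 * c2)
    return np.sqrt(p_plus) * np.exp(1j * np.pi * d1 / c1) * Z_PLUS + np.sqrt(p_minus) * Z_MINUS

def classify(d1, d2, c1=1.0, c2=1.0):
    """Index i of the operator P_i with the largest expectation value <psi|P_i|psi>."""
    psi = encode(d1, d2, c1, c2)
    scores = [np.vdot(psi, P @ psi).real for P in OPERATORS]
    return int(np.argmax(scores))

print(classify(0.0, 0.0), classify(0.2, 1.0), classify(0.0, -1.0))
```

With these assumptions, vanishing derivatives are assigned to $P_4$ and an extremal second derivative to $P_0$ or $P_1$ regardless of the first derivative.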
Alternatively, if $X_i = \{P_i, I - P_i\} = \{|\chi_{i,0}\rangle\langle\chi_{i,0}|, |\chi_{i,1}\rangle\langle\chi_{i,1}|\}$, $i = 0, \ldots, 4$, and $|\chi\rangle$ represents the eigenvector of the corresponding projector, we can minimize the distance $D(|\chi_{i,0}\rangle, |\psi(t)\rangle)$. The results of the classification for an analytic function




Figure 6. An application of the segmentation technique. The original image on the top left was smoothed on the constant part $P_4$ of the image function as given in the bottom right picture. The result of smoothing adaptively with an average filter (15 x 15) is shown in the lower left picture ($P_0, \ldots, P_3$ are not filtered). The top right shows the filter result using the same filter kernel independent of the image content.



(5.2) $f(t) = \begin{cases} \cos(t), & |t| < 2\pi \\ 1, & t > 2\pi \end{cases}$

are given in the middle row of Fig. 5, whereas the bottom row (left and middle picture) shows the decision boundaries over the entire joint distribution of the derivatives as derived by the described approach. Usually we cannot access an analytic expression for $f(t)$ but a measurement of either $f(t)$ or $f'(t)$ and $f''(t)$ is available. If we measure $f(t)$ we have to derive the derivatives numerically. An example for the measurement of $f'(t)$ and $f''(t)$ is given by observing the velocity and acceleration



Depending on the application, this might require some low-pass filtering of the measurements to gain stability in the derivatives.
of an object. In general these measurements will be taken by independent devices but they are highly correlated quantities.
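When only samples of $f(t)$ are available, the derivatives have to be estimated. The following is a minimal sketch, assuming uniformly sampled data, central differences via `numpy.gradient`, and an optional moving-average filter as one possible low-pass step; the function names and the window width are illustrative choices.

```python
import numpy as np

def smooth(signal, width):
    """Moving-average low-pass filter with edge padding; 'width' should be odd."""
    kernel = np.ones(width) / width
    padded = np.pad(np.asarray(signal, dtype=float), width // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def derivatives(samples, dt, width=1):
    """Estimate f' and f'' from uniformly spaced samples of f."""
    f = smooth(samples, width) if width > 1 else np.asarray(samples, dtype=float)
    d1 = np.gradient(f, dt)    # first derivative
    d2 = np.gradient(d1, dt)   # second derivative as derivative of the first
    return d1, d2

# Example on the cosine part of the test function.
t = np.linspace(-2 * np.pi, 2 * np.pi, 2001)
dt = t[1] - t[0]
d1, d2 = derivatives(np.cos(t), dt)
```

The estimates are least accurate at the boundaries of the sampled interval, where one-sided differences are used, and a wider smoothing window trades noise suppression against a flattening of genuine extrema.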

In either case the measurements will be subject to errors. We may choose different error models for the respective areas. In the example we gave in the introduction we could decide that the acoustic signal is mainly noise during the day time and the visual signal is not available at night time. Towards the poles of the Bloch sphere the phase term (in this case $f'(t)$) becomes less important as it seamlessly transforms itself into a global phase factor. In equatorial regions it becomes predominant. A quantitative discussion of this depends on the concrete problem to be solved, especially on the statistical error model(s) to be chosen, and shall be left for a future publication.

A standard approach to derive the decision regions would be to create a model of the decision problem and then minimize some distance function over the entire joint signal distribution to this model, as shown in the bottom right picture of Fig. 5 for the function given in Equation 5.2. To derive this model we have to notice the (partially) nonlinear dependency of the derivatives:

(5.3) $f'(t) = -\sin(t)$, $f''(t) = -\cos(t)$, $f'(t)^2 + f''(t)^2 = 1$, for $|t| < 2\pi$;
$f'(t) = f''(t) = 0$, for $t > 2\pi$.

In this model the decision depends on both signals over (almost) the entire joint distribution range, whereas the model derived from the conjugate variables cuts off the influence of one signal around the poles of the Bloch sphere.



6. Discussion 

The approach of modelling joint distributions of signals as conjugate variables certainly has its limitations. We remark that its strengths go along with its weaknesses. Whenever all signals are meaningful over the entire joint distribution range, other approaches might be more successful.

On the other hand, in situations where we have an expectation that we can rely on some signals more than others, depending on some given (or derived) parameters, our approach will be valuable as this dependency may be modelled directly. It is noteworthy that the role of the two signals used for the encoding might be quite different. In the bird example we may classify the two species, let's call them Happy and Unhappy, using the acoustic signal alone at times, but even when this signal becomes mixed we can still rely on its presence and then take the optical signal into account. The converse is not true as we cannot speak about a colour at night. Therefore, the optical sensor (used as phase information) might give arbitrary measurement results (including blue or green!) at night time. This will not influence the classification result as the measurement of the conjugate acoustic signal is mapped to one of the polar regions at night time. In lucky circumstances, e.g. when by chance we have no background noise during the day or the moon provides enough light at night, the joint observation of chirp and colour does not change the classification result either, as a Happy bird does not become more Happy if we know both features. When the noise level is slowly increasing (at sunrise or sunset)



In the Hilbert space setting used in this paper we need a higher dimensional system, e.g. a two qubit system. Of course, we may add conjugate information to each of those systems, if it is available to us.
seamlessly both signals contribute to the classification and allow a discrimination of the two classes.

Just to give a flavour of the results to expect, Fig. 6 shows the application of an adaptive filter on image functions that is based on the segmentation approach. At the top left the original image (used as the signal $f(t)$) is given, which is to be smoothed to remove noise introduced by the sensor (camera, scanner, etc.). If we assume uniform noise, we can remove this by averaging over small parts of the image function. The application of an average filter independent of the image structure will remove this noise, but unfortunately it will remove parts of the image content at the same time, as can be seen in the top right picture of Fig. 6. If we classify each point of the image according to the approach described in Section 5, we can apply the averaging process only to image areas that are labelled constant, i.e. $P_4$ (Fig. 6 bottom right). This results in an image that is fairly smooth in these constant image areas but untouched everywhere else (Fig. 6 bottom left). In the other areas we could apply different filter kernels (and different error models) if we wish to do so.
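The selective averaging step can be sketched for a one-dimensional signal as follows. This is an illustration only: the paper applies a 15 x 15 average filter to images, whereas the sketch uses a 1-D moving average, and the label array, the function names and the default label `4` (standing for the constant class $P_4$) are assumptions made for the example.

```python
import numpy as np

def box_filter(signal, width):
    """Moving average with edge padding; 'width' should be odd."""
    kernel = np.ones(width) / width
    padded = np.pad(signal, width // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def smooth_where(signal, labels, target_label=4, width=5):
    """Replace samples labelled 'target_label' by their local average, keep all others."""
    signal = np.asarray(signal, dtype=float)
    smoothed = box_filter(signal, width)
    return np.where(np.asarray(labels) == target_label, smoothed, signal)

# Illustrative data: a noisy flat part (label 4) followed by a peak (label 0).
signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 9.0, 5.0])
labels = np.array([4, 4, 4, 4, 4, 0, 0, 0])
out = smooth_where(signal, labels, width=3)
print(out)
```

Note that the average near a class boundary still includes neighbouring samples from other classes; restricting the window to samples of the same label would be a possible refinement.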

7. Conclusion and Future Work 

In this paper we have described a method for modelling joint distributions of 
signals as conjugate variables in finite-dimensional complex Hilbert spaces. We have 
derived a distance function based on the transition probability of quantum states 
that is a metric on pure states and satisfies the triangle inequality for mixed states, 
and we have related it to some distance functions between probability distributions. 
The analysis of other distance measures on quantum states, especially the so called 
trace distance which is closely related to the variational distance of probability 
distributions, will be a future task. 

We believe that the mathematical concept of quantum information theory offers a fertile resource in many areas of information processing. In this article we mainly focused on non-decomposed principle systems, but it would be very interesting to analyze the decomposition of higher-dimensional systems, which remains a future task. We have not touched on the field of quantum operations, i.e. unitary transformations of quantum states, which again offers a rich potential. Additionally, generalized measurement operators (positive operator valued measurements (POVM)) offer a valuable future research direction. The use of Wigner functions provides another promising research area. Wigner functions relate the probability distributions of conjugate variables to each other as they describe this joint (real valued) distribution. Moreover, they relate the spatial to the frequency domain, which may have further impact.